With an increasing demand for power-hungry, data-intensive computing,
low-power design methodologies are gaining prominence in the industry.
Most systems operate on both critical and non-critical data. Computing a
fully precise result for every operation leads to excessive power
consumption and a slower system. For non-critical data, approximate
computing circuits significantly reduce circuit complexity and hence power
consumption. In this paper, a novel approximate single-precision
floating-point adder is proposed with an approximate mantissa adder. The
mantissa adder is designed with three 8-bit full adder blocks. A detailed
mathematical background and the proposed design approach, in terms of the
circuit configuration and truth tables, are discussed. Additionally, a
concept of switching between exact computing and approximate computing is
analysed considering an approximate carry look-ahead adder. The delay and
power consumption of the adder in the exact operating mode and the
approximate operating mode are observed for varied window sizes. The
performance of the approximate computation is compared against exact
computation and varied approximate computing approaches.

1. INTRODUCTION

Battery-operated portable electronic devices have increasingly become an indispensable part of 
everyday life. The key behind this is the scalability of metal-oxide-semiconductor field-effect transistors
(MOSFETs) in very large-scale integration (VLSI): functionality per unit area has increased, which has
brought device prices down and led to wide usage. Scaling and the increase in functionality per unit area
have, however, also increased power consumption. Even though massive
developments and innovations are observed in the last few years in digital integrated circuits, high power 
consumption remains the main issue. Due to the massive demand for performance-oriented circuits, power 
consumption becomes an important feature for appropriate circuit design. However, the increase in power 
consumption of the VLSI devices has not been matched by the improvement in the capacity of the battery. 
Therefore, operation time per charge has come down causing inconvenience to the users. For this reason, 
reducing the power consumption of portable devices has become a compelling design constraint. 
Furthermore, achieving desired performance trade-off between energy efficiency and reliability in portable 
digital processing systems and power circuits has become a massive challenge. Adders are a key building
block of an arithmetic and logic unit. Adders also support varied operations like multiplication, division,
and subtraction, are considered among the most power-hungry circuit components, and tend to sit in
hot-spot locations [1].

Here, a large portion of energy consumption is dominated by two components: dynamic power and 
leakage power. To extend the battery life various technology-based, architecture-based, and circuit-based 
solutions that reduce the sum of the two power components without sacrificing the performance have to be 
developed. As a result of extensive research, varied design methods have been presented over the years to
meet power and speed design requirements. Among them, one of the most prominent emerging approaches is
approximate computing, which handles market demands and maintains performance with high power
efficiency while affecting computational accuracy only minimally [2]. This approach is built specifically
for applications in which a set of approximate answers is acceptable.

At the technology level, feature size scaling has continuously brought lower power circuits by 
reducing the supply voltages. To retain performance, the threshold voltages of these circuits have also been 
reduced with technology scaling. However, in recent technologies, the benefits of constant-field scaling have 
been compromised by an exponential increase in the leakage current. On the architectural level, pipelining 
and parallelism have helped in lowering the power consumption of digital circuits. Besides, in the current 
complementary metal-oxide-semiconductor (CMOS) technology, the benefits of device scaling are impeded 
by reliability issues due to process variations, aging effects, and soft errors. Leakage current and static power 
are increasingly adding to the concerns about achieving low power consumption. Hence, device scaling,
which once offered advantages for low-power applications, is no longer attractive, and new
architectures need to be developed to achieve low power consumption. Thus, a new paradigm, the design of
approximate computation blocks, has been introduced to simplify arithmetic unit circuits [3]. Many
researchers have designed approximate adders with varied structures [4]-[14]. Most of these adders are
highly advantageous for error-resilient applications. However, they exhibit a constant level of deviation
from the actual result, which means that their accuracy is not tunable during system operation.
Therefore, accuracy reconfigurability at runtime, offering varying levels of quality of service, can be an
important and advantageous feature [4]-[6]. By compromising quality, computation time and power
consumption can be reduced. As a result, energy efficiency can be improved.

Furthermore, most of the modern graphics processors for multimedia and other applications have 
dedicated digital signal processing (DSP) blocks. These applications output an image, video, or audio signal 
and the limited perception of human senses allows for an approximation of the computations involved in the 
demanding DSP algorithms for these applications [2]. Even an analog computation that yields good-enough
rather than accurate results is acceptable [4]. Addition is the most fundamental and significant
mathematical operation used in all signal and image processing applications [5], [6]. Deterministic
approximate logic or probabilistic imprecise arithmetic is normally employed for soft adders [7]. In
addition, low-power designs through approximate computing have been proposed using algorithmic noise
tolerance [15], [16], non-uniform voltage over-scaling [17], and significance-driven computation [18], [19].
Verma et al. [7] proposed a novel adder design called the almost correct adder (ACA), which is
exponentially faster than traditional adders, and a variable latency speculative adder (VLSA) with an area
overhead. Other adder configurations meet real-time energy requirements by complexity reduction at the
algorithm level [20], [21]. The lower-part OR adder [22] is based on approximate logic with a different
truth table than that of an original adder. The probabilistic full adder (PFA) [23], [24] is based on
probabilistic CMOS, a technology platform for modeling the behavior of nanometric designs as well as
reducing power consumption.

Additionally, general-purpose processors are used in some digital systems for both exact and
approximate computing. These processors must provide dynamic switching between exact and approximate
computing. A correction unit can be added to the design circuit to provide this feature so that switching
between approximate and exact computing can be performed easily. However, incorporating a correction
unit increases the delay, power, and design area overhead. Moreover, an error correction circuit that
requires more than one clock cycle can slow down processing [7], [8].

Therefore, this paper presents the design of an approximate adder and compares its performance
with exact computing. Furthermore, a novel approximate single-precision floating-point adder is proposed
with an approximate mantissa adder. The mantissa adder is designed with three 8-bit full adder blocks.
Additionally, an approximate carry look-ahead adder is presented that can switch between exact and
approximate computing. Thus, the design circuit has an exact and an approximate operating mode. The
proposed adder structure does not require any error correction unit and works on the principle of the
traditional carry look-ahead adder for switching between exact and approximate computing. The delay and
power consumption in the exact operating mode are kept similar to those of the traditional carry
look-ahead adder. In the approximate operating mode, however, the delay and power consumption are lower
than in the exact mode. The power consumption is significantly reduced in the approximate operating mode
by exploiting the power gating technique. The performance of the proposed 8-bit approximate adder is
evaluated in two parts. The first part concentrates on metrics such as delay, error metrics, and the power
consumption trade-off in terms of truth tables, circuits, and performance values, and the results are
compared with exact computing outputs. In the second part, outputs are compared against varied previous
approximate paradigms in terms of error percentage, mean error detection, delay, power consumption, and
normalized error detection considering different window sizes.

This article is organized as follows. Section 2 describes related work and discusses the design
challenges of the approximate adder. Section 3 describes the methodology to design approximate adders
for low-power applications. Section 4 describes the experimental results, and section 5 concludes the
paper.


2. RELATED WORK 

Despite the massive advancements in semiconductor technologies, low-power design
applications, and power-optimization methodologies, several computing devices and processing machines
require high power to handle large-scale data processing and computations. This problem becomes more
challenging in the case of mobile and internet of things (IoT) devices or battery-operated systems. Thus,
several researchers have shown great interest over the years in providing power-handling mechanisms in
data processing layers, especially for these devices [22], [25]. Data is gathered in huge volumes from
computing devices in various domains, and analysing it is challenging. However, error-resilient
applications such as data mining, synthesis, and recognition or analysis applications are used to extract
meaningful information from this data [2]. One approach to this kind of issue is inexact solutions such as
approximate computing. This approach has emerged strongly in the last few years to minimize high power
consumption and enhance system efficiency [26]. This approach can be utilized in
many areas at varying levels such as in numerous devices, hardware systems, software, programming 
languages, algorithms, design architectures, and circuit designs. This approximate computing paradigm can 
be useful for the development of automated design and assessment. The base of approximate computing is its 
design rules like significance-guided design. This can be extended to a specific level of design based on a 
particular application. 

Due to massive interest from the electronic and semiconductor industries, the approximate 
computing approach has come out as one of the most widely adopted power control paradigms in recent 
years. Thus, a large number of researchers have provided detailed analyses and studies based on
generalized significance-guided design principles, which concentrate on lower resource utilization and
lower computational complexity. Based on this principle, voltage scaling in CMOS design and logic circuit
reduction are applied. In probabilistic CMOS VLSI circuits, a high supply voltage is provided to the
critical circuits so that the accuracy of the most significant bits is ensured, while the supply voltage is
lowered for the least significant bits to save power. In approximate computing, accuracy may be
compromised compared to exact computing, but only minimally. For instance, a probabilistic adder is
constructed from a conventional adder whose supply voltage depends upon the level of importance [9].
However, implementation cost can be a significant issue in this approach due to the complexity of supply
voltage control. Thus, most approximate computing circuits focus on logic reduction design, which can be
implemented and controlled using the pruning method.

Furthermore, floating-point adders have attracted considerable attention in recent years. Liu et al.
[27] have presented a floating-point adder design to improve critical path delay and enhance the
efficiency of image processing applications. Camus et al. [28] have presented an approximate
floating-point adder for image-processing applications by combining an inexact speculative adder approach
and a gate-level pruning methodology to build an approximate multiplier architecture. In that work, three
approximate units are presented and applied to tone mapping of high dynamic-range images. Yan et al. [29]
have presented an approximate floating-point multiplier for the customization of a machine-learning
framework based on the k-nearest neighbors (KNN) algorithm, and this concept is used for handwritten
digit recognition. In processors where large-scale data processing is required, high power is dedicated
to the processing blocks. The main processing blocks that rely on adder design are arithmetic and logic
units (ALUs) [30]. Thus, overall power consumption may be reduced by minimizing the power required by
the adder unit.


3. PROPOSED METHOD FOR APPROXIMATE DESIGN
3.1. Basic building block: 8-bit approximate adder 

The carry equation for a conventional carry look-ahead adder is given by (1), where $M_{in}$ is the
input carry and $R_k$ and $L_k$ are the propagate and generate signals of the $k$-th stage. The carry
equation can be split into two segments, as in (2).


$$M_{k+1} = L_k + L_{k-1} R_k + \cdots + L_0 \prod_{i=1}^{k} R_i + M_{in} \prod_{i=0}^{k} R_i \qquad (1)$$


$$M_{k+1} = \left( \sum_{i=k-N+1}^{k} L_i \prod_{j=i+1}^{k} R_j \right) + \left( \sum_{i=0}^{k-N} L_i \prod_{j=i+1}^{k} R_j + M_{in} \prod_{j=0}^{k} R_j \right) \qquad (2)$$


Here $N$ is the window size; the first segment covers the $N$ most significant (MS) bit positions and
the second segment covers the remaining least significant (LS) bit positions. The first part of (2) is the
approximate part, while the second part is called the augmenting part. For approximate carry generation
with a window size of $N$, the output carry at the $k$-th stage is computed using the approximate part
only. Computing an approximate $M_{k+1}$ is faster and consumes fewer hardware resources, and hence
less power, than computing the precise carry.
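
The split in (2) can be illustrated with a short behavioural sketch. It is not the authors' implementation: the function names are hypothetical, the propagate signal is assumed to be the XOR of the operand bits, and the generate signal is assumed to be their AND. The exact carry follows (1) through the usual recurrence, while the approximate carry keeps only the window of bits that ends at stage k.

```python
def exact_carry(a, b, k, m_in=0):
    """Exact carry M_{k+1} out of bit k, i.e. equation (1) as a recurrence."""
    carry = m_in
    for i in range(k + 1):
        gen = (a >> i) & (b >> i) & 1         # L_i (assumed: a_i AND b_i)
        prop = ((a >> i) ^ (b >> i)) & 1      # R_i (assumed: a_i XOR b_i)
        carry = gen | (prop & carry)
    return carry


def approx_carry(a, b, k, window):
    """Approximate part of (2): only the `window` bits ending at bit k are used."""
    carry = 0                                 # the augmenting part is dropped
    for i in range(max(0, k - window + 1), k + 1):
        gen = (a >> i) & (b >> i) & 1
        prop = ((a >> i) ^ (b >> i)) & 1
        carry = gen | (prop & carry)
    return carry
```

For example, with a = 0b0111 and b = 0b0001 the carry generated at bit 0 propagates through bits 1 and 2, so a window of 2 misses it while a window of 3 recovers it.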

Earlier work has shown that computing $M_{k+1}$ approximately is faster and consumes less power
than computing it exactly. The proposed reconfigurable adder uses this property to provide two operating
modes, exact and approximate. Compared with the traditional carry generator, only one additional
multiplexer is used to compute $M_{k+1}$. The multiplexer inputs are the approximate and augmented carry
signals, and the operating-mode signal acts as the selector, so that the carry output is generated in
either the exact or the approximate mode. The gate-level implementation of the proposed methodology is
shown in Figure 1. When the carry computation block is in the approximate mode, the augmenting part is
power-gated using p-channel metal-oxide-semiconductor (p-MOS) headers to reduce its power consumption.
When all the carries of the adder are generated with the proposed structure, a single signal controls all
the carry generation units. However, all the adder output bits may then be imprecise. The accuracy of the
approximate adder is enhanced by using exact carry generators for the MS bit group, as demonstrated in
[10] and [20]. This approach can be used to design carry look-ahead adders with multiple accuracy levels.
To build structures with adjustable accuracy, the adder is subdivided into segments that can each operate
in the approximate or exact mode. Thus, in this paper, the efficiency of approximate and exact computing
is discussed.

Figure 1. Carry generator for 4-bit block
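
The mode switching described above can be mimicked in software as a rough behavioural model. The sketch below is an assumption-laden illustration, not the gate-level circuit of Figure 1: `window_carry` and `reconfigurable_add` are hypothetical names, the mode flag plays the role of the multiplexer selector, and power gating is not modelled at all.

```python
def window_carry(a, b, lo, hi):
    """Carry out of bit hi-1 using only bits lo..hi-1 (carry into bit lo is 0)."""
    carry = 0
    for i in range(lo, hi):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        carry = (ai & bi) | ((ai ^ bi) & carry)
    return carry


def reconfigurable_add(a, b, width=8, window=4, exact_mode=True):
    """Adder whose carries use all lower bits (exact) or a fixed window (approximate)."""
    result = 0
    for k in range(width + 1):
        # The mode flag selects between the full carry chain and the windowed one.
        lo = 0 if exact_mode else max(0, k - window)
        carry_in = window_carry(a, b, lo, k)
        if k == width:
            result |= carry_in << k           # final carry-out
        else:
            result |= (((a >> k) ^ (b >> k) ^ carry_in) & 1) << k
    return result
```

In the exact mode the model reproduces ordinary addition. In the approximate mode a long carry chain is cut at the window boundary, which is the source of the error analysed in the next subsection.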


3.2. Accuracy analysis of approximate adder 

The accuracy of the approximate block is studied using error metrics, and analytical expressions are
derived for the proposed adder methodology. This section analyses the accuracy of the approximate adder
in comparison with traditional approximate adders. In the proposed ripple carry adder design, the
generated approximate carry $M_k$ decides the adder accuracy for every bit position. The function
$B(M_k)$ evaluates the exactness of the approximate carry $M_k$ for the $k$-th position based on the
input signals and is given in (3), where $n$ is the window size and $B(M_k) \in \{0,1\}$. Then, the error
probability of the $k$-th approximate carry is evaluated by (4). Then, the error rate for the proposed
model considering the n-bit ripple carry adder is evaluated as given in (5).


BM) = Tlézken Ra x [ for Lin Was Re ~ Lk-n-1] (3) 
n+1 k-n-d-1 

RMD = (3) East (3) (a) 

R(error(n, s)) = ies (CDH ieee nce |R(Mk1) N N R(M;c)|) (5) 


After error probability estimation, error detection is an important aspect of the accuracy analysis for
the proposed adder mechanism. Using (3), the error detection metric for the proposed approximate adder
mechanism is given by (6).


ED(X,Y,n,s) = Xizn+ı 2” (1) B (M;) + 25B (M5) (6) 


Then, the normalized error detection metric is given by (7),


$$NED = \frac{1}{2^{2n}} \sum_{k=1}^{2^{2n}} \frac{ED_k}{J} \qquad (7)$$

where the maximum error value is given by $J$. Then, the average relative error detection metric for
the approximate adder is given by (8).


$$ARED = \frac{1}{2^{2n}} \sum_{k=1}^{2^{2n}} \frac{|ED_k|}{G_k} \qquad (8)$$
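
As an illustration of how such metrics can be evaluated exhaustively, the sketch below sweeps every operand pair of a small adder. It rests on several assumptions that are not stated in the text: the maximum error J is taken as the largest observed ED, input pairs whose accurate result is zero are skipped in the relative error, and `drop_lsb_add` is a hypothetical toy adder used only to exercise the code.

```python
from itertools import product


def error_metrics(approx_add, width):
    """Error statistics over all 2^(2*width) operand pairs of a width-bit adder."""
    n_pairs = 1 << (2 * width)
    eds, rel = [], []
    for a, b in product(range(1 << width), repeat=2):
        exact = a + b                          # accurate result for this input pair
        ed = abs(approx_add(a, b) - exact)     # error value ED_k
        eds.append(ed)
        if exact:                              # assumption: skip pairs with exact == 0
            rel.append(ed / exact)
    max_ed = max(eds)                          # assumption: J = largest observed ED
    return {
        "max_ed": max_ed,
        "ned": sum(eds) / (n_pairs * max_ed) if max_ed else 0.0,
        "ared": sum(rel) / n_pairs,
        "error_rate": sum(1 for e in eds if e) / n_pairs,
    }


def drop_lsb_add(a, b):
    """Hypothetical toy approximate adder: ignores bit 0 of both operands."""
    return (a & ~1) + (b & ~1)


metrics = error_metrics(drop_lsb_add, width=2)
```

Any callable that maps two operands to an approximate sum can be passed in place of the toy adder.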


3.3. IEEE-754 floating point format 

A floating-point representation provides a higher dynamic range than a fixed-point representation of
real numbers. Floating-point hardware is complex and consumes significant power. The most commonly used
standard for the floating-point (FP) format is IEEE 754-2008 [31]. The basic and extended types supported
by this standard are half precision (16 bits), single precision (32 bits), double precision (64 bits),
extended precision (80 bits), and quad precision (128 bits). A general IEEE FP format is shown in
Figure 2. The exponent part has a bias of $2^{E-1}-1$, where $E$ is the number of exponent bits. The
single-precision and double-precision formats are mostly used in today's computers. Table 1 lists the
exponent and mantissa bits for IEEE-754 basic and extended floating-point types.


sign | exponent | mantissa


$$\text{FP number} = (-1)^{\text{sign}} \times 2^{\text{exponent} - \text{bias}} \times (1 + \text{mantissa})$$


Figure 2. General IEEE-754 floating-point format 
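
The format in Figure 2 can be checked with a small sketch that unpacks a single-precision value into its three fields and rebuilds it from the formula. The helper names are hypothetical, and only normal numbers are handled (zero, subnormals, infinities, and NaN are ignored).

```python
import struct


def decode_float32(x):
    """Split a Python float, stored as IEEE-754 single precision, into its fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                  # 1 sign bit
    exponent = (bits >> 23) & 0xFF     # 8 exponent bits (biased)
    mantissa = bits & 0x7FFFFF         # 23 mantissa bits (hidden 1 not stored)
    return sign, exponent, mantissa


def reconstruct(sign, exponent, mantissa, exp_bits=8, mant_bits=23):
    """Evaluate (-1)^sign * 2^(exponent - bias) * (1 + mantissa) for normal numbers."""
    bias = (1 << (exp_bits - 1)) - 1   # 2^(E-1) - 1, i.e. 127 for single precision
    return (-1) ** sign * 2.0 ** (exponent - bias) * (1 + mantissa / (1 << mant_bits))
```

For instance, -6.5 equals -1.625 x 2^2, so the stored exponent is 127 + 2 = 129.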


Table 1. Exponent and mantissa bits for IEEE-754 basic and extended floating point types 


Type      Sign bit  Exp. bits  Mant. bits  Total bits  Mant. bits/total
Half      1         5          10          16          62.5%
Single    1         8          23          32          71.9%
Double    1         11         52          64          81.2%
Extended  1         15         64          80          80.0%
Quad      1         15         112         128         87.5%


3.4. Floating point adder architecture 

A generic FP adder architecture includes hardware blocks for exponent comparison, mantissa 
alignment, mantissa addition, normalization, and rounding of the mantissa (shown in Figure 3 and detailed in 
[30], [32], [33]). The two operands are first unpacked from the FP format, and the hidden '1' bit is
prepended to each mantissa. The addition of FP numbers involves comparing the two exponents and adding
the two mantissae; the exponents are first evaluated to find the larger number. The mantissae are then
swapped according to the exponent comparison and aligned to have an equal exponent prior to the addition
in the mantissa adder. Following the addition, normalization shifts are required to restore the result to
the IEEE standard format. The normalization is completed by left shifting by the number of leading zeros;
therefore, leading zero detection is a key step for normalization. Rounding the normalized result is the
last step before storing the result; special cases (such as overflow, underflow, and not a number) are
also detected and represented by flags.

Figure 3. Floating point adder algorithm
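
The steps above can be sketched for the simplest case. The code below is a minimal illustration under stated assumptions, not the architecture of Figure 3: it handles only two positive normal single-precision operands, truncates instead of rounding, and ignores overflow and other special cases. For same-sign addition, normalization needs at most one right shift; the leading-zero left shift only arises for subtraction, which is not modelled. The `mantissa_add` parameter is a hypothetical hook where an approximate mantissa adder could be substituted.

```python
import struct


def f32_bits(x):
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_f32(bits):
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def fp_add_positive(x, y, mantissa_add=lambda p, q: p + q):
    """Add two positive normal float32 values; the result is truncated, not rounded."""
    bx, by = f32_bits(x), f32_bits(y)
    ex, ey = (bx >> 23) & 0xFF, (by >> 23) & 0xFF
    mx = (bx & 0x7FFFFF) | 0x800000        # unpack and prepend the hidden 1
    my = (by & 0x7FFFFF) | 0x800000
    if ex < ey:                            # swap so that x has the larger exponent
        ex, ey, mx, my = ey, ex, my, mx
    my >>= ex - ey                         # align: shifted-out bits are dropped
    total = mantissa_add(mx, my)           # 24-bit mantissa addition
    if total >> 24:                        # carry-out: shift right, bump the exponent
        total >>= 1
        ex += 1
    return bits_f32((ex << 23) | (total & 0x7FFFFF))
```

With the default exact `mantissa_add`, operands whose aligned mantissae lose no bits are added exactly, for example 1.5 + 2.25.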


3.5. Approximate floating-point adder 

The approximate FP adder design originates at the architecture level, with the exponent and mantissa
adder/subtractor designed using approximate fixed-point adders. An N-bit adder consists of two parts, i.e.,
an m-bit exact adder and an n-bit inexact adder (Figure 4). The exact adder part can be implemented with
exact full adder circuits. The inexact adder ignores the carry bits during computation, thereby reducing
the critical path as well as the hardware utilization. The modified approximate adder concept can also be
used for the mantissa adder. The mantissa adder offers a larger scope for approximation, as the mantissa
has more bits than the exponent, and at the same time approximation in the mantissa adder has a lower
impact on the error because the mantissa part is less significant than the exponent part. Therefore, an
inexact design of the mantissa adder is more appropriate.

Figure 4. Approximate adder concept
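
A minimal sketch of this partitioning is given below. It assumes that "ignoring the carry bits" means each inexact sum bit is the XOR of the operand bits and that no carry is passed into the exact part; both are illustrative assumptions, and `split_add` is a hypothetical name.

```python
def split_add(a, b, inexact_bits=4):
    """Adder with a carry-free lower part and an exact upper part."""
    low_mask = (1 << inexact_bits) - 1
    low = (a ^ b) & low_mask                            # inexact part: carries ignored
    high = (a >> inexact_bits) + (b >> inexact_bits)    # exact part, carry-in assumed 0
    return (high << inexact_bits) | low
```

Under these assumptions the result is exact whenever the lower parts of the operands never have a 1 in the same bit position, since no carry is generated there.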


An 8-bit adder is chosen as the basic building block for the FP approximate adder in the proposed 
design. As shown in Figure 5, the 8 bits are partitioned into two blocks, the MS block is of 4 bits, while the 
LS block is of 4 bits. Figure 6 shows the schematic diagram of the conventional full adder. 

MSB | 4-bit MS block | 4-bit LS block | LSB


Figure 6. Conventional full adder 


In the proposed adder, the sum and carry for the W-bit LS window are computed as per (9). The
schematic of the 1-bit full adder in the proposed configuration is given in Figure 7. The sum and carry for
the most significant (8-W=4) bits are computed as for an exact adder, as given in (10). The truth table for
the proposed sum and carry equations is given in Table 2.

$$C_{i+1} = a_i b_i + c_{in}, \qquad S_i = \overline{C_{i+1}} \qquad (9)$$


$$C_{i+1} = a_i b_i + c_{in} (a_i \oplus b_i), \qquad S_i = a_i \oplus b_i \oplus c_{in} \qquad (10)$$


Figure 7. Schematic for carry and sum generator of proposed adder 


Table 2. Truth table for the proposed sum and carry
               Proposed     Exact
A  B  Cin      S   Cout     S   Cout
0  0  0        1   0        0   0
0  0  1        0   1        1   0
0  1  0        1   0        1   0
0  1  1        0   1        0   1
1  0  0        1   0        1   0
1  0  1        0   1        0   1
1  1  0        0   1        0   1
1  1  1        0   1        1   1

Overall, 3 errors are introduced in the sum computation and 1 error in the carry computation.
Assigning the inverted carry-out at each stage to the sum computed for that stage reduces the hardware for
the sum computation block. This is a significant reduction in hardware requirements compared to a
conventional adder. Using look-ahead carry generation logic for the 4 MS bits improves the timing
performance of the circuit, since it no longer depends on the sequential computation of the carry at each
bit. A total of 8 transistors are used for 1-bit sum and carry generation.
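
The error counts can be reproduced by enumerating the eight input combinations. The sketch reads (9) as a carry of a·b + c_in with the sum taken as the inverted carry-out, which agrees with the Proposed columns of Table 2, and compares it with the exact full adder of (10). The function names are hypothetical.

```python
from itertools import product


def exact_fa(a, b, cin):
    """Exact full adder, equation (10)."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))


def proposed_fa(a, b, cin):
    """Approximate cell as read from (9): carry = a*b + cin, sum = inverted carry."""
    cout = (a & b) | cin
    return 1 - cout, cout


inputs = list(product((0, 1), repeat=3))
sum_errors = sum(exact_fa(*x)[0] != proposed_fa(*x)[0] for x in inputs)
carry_errors = sum(exact_fa(*x)[1] != proposed_fa(*x)[1] for x in inputs)
print(sum_errors, carry_errors)
```

Under this reading the enumeration gives three sum mismatches and one carry mismatch, matching the counts quoted in the text.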


3.6. Mantissa approximate adder 

For realizing the 23-bit approximate adder, three 8-bit adders are used. The lower two 8-bit adders 
are the proposed 8-bit approximate adders, while the MS byte is implemented using an exact 8-bit adder. The 
proposed 23-bit mantissa adder is shown in Figure 8. 


Figure 8. Proposed 23-bit mantissa adder
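
The block-level composition can be sketched as follows. How carries are passed between the 8-bit blocks is not spelled out in the text, so the sketch assumes that each block returns a 9-bit value whose top bit is rippled into the next block. `exact8` is a reference block, `no_carry8` is a hypothetical stand-in for an approximate block (it is not the proposed cell), and the three blocks together cover 24 bits.

```python
def exact8(a, b, cin=0):
    """Reference 8-bit block: returns a 9-bit value, bit 8 being the carry-out."""
    return (a & 0xFF) + (b & 0xFF) + cin


def no_carry8(a, b, cin=0):
    """Hypothetical approximate block: carry-free XOR sum, never produces a carry."""
    return (a ^ b) & 0xFF


def mantissa_add(a, b, low_block=exact8, high_block=exact8):
    """Three 8-bit blocks: two low bytes use low_block, the MS byte uses high_block."""
    result, carry = 0, 0
    for idx, block in enumerate((low_block, low_block, high_block)):
        r = block((a >> (8 * idx)) & 0xFF, (b >> (8 * idx)) & 0xFF, carry)
        result |= (r & 0xFF) << (8 * idx)
        carry = (r >> 8) & 1               # assumed: carry ripples between blocks
    return result | (carry << 24)
```

Passing an approximate block as `low_block` while keeping `high_block` exact mirrors the arrangement described above, where only the two lower bytes are approximated.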


3.7. Exponent adder/subtractor

As the exponent has the most impact on the accuracy of the result, the exponent adder is
implemented using an exact 8-bit adder. The maximum ED for the 8-bit adder is 3.


4. RESULT AND DISCUSSION 

The proposed adder circuit is simulated in a Cadence environment to analyse delay, power
consumption, and error. Several metrics are used in the performance discussion that follows.


4.1. Performance discussion for approximate floating-point adder 
4.1.1. Error metrics for proposed 8-bit adder 

The proposed 8-bit adder with a window size of W=4 is simulated for all possible input
combinations of a and b. For all 256 × 256 combinations, the approximate and exact sums and carries
are computed, and the error distance (ED) between the approximate and accurate outputs is evaluated. The
maximum ED for the 8-bit adder is 3.


4.1.2. Delay 

In a conventional 8-bit adder, the delay of the 8-bit computation is due to the ripple carry
effect, which spans 8 full-adder stages. Assuming the delay of a 1-bit full adder to be T, the delay in
generating the 8-bit result is 8T. In the proposed adder, the total delay is equal to the delay for
computing the carry out from the 4 MS bits, which is equal to 4T.


4.1.3. Power consumption tradeoff 

The energy consumed by a probabilistic inverter increases exponentially with the probability of 
correct output. The power consumption is considered proportional to the number of gates in an approximate 
implementation. In the proposed adder circuits, the reduction in the number of transistors allows a lower 
operating voltage reducing the overall power consumption. For a conventional full adder, if the power 
consumed for 1-bit is normalized to 1, then the power consumed for a k-bit conventional full adder is k. For 
the proposed 8-bit adder, the operating voltage is reduced to 1.04 V from 1.13 V for accurate implementation 
due to a reduction in the number of transistors. Hence, the estimated power consumption is evaluated by (11), 


$$E_{8\text{-bit}} = W \times \frac{1.04^2}{1.13^2} + (8 - W) = 4 \times \frac{1.04^2}{1.13^2} + 4 \approx 7.39 \qquad (11)$$


which is about 7.6% lower than the conventional adder.
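
The estimate in (11) can be evaluated numerically with a few lines. The sketch assumes, as the equation does, that the power of a bit scales with the square of the supply voltage and that only the W approximate bits run at the reduced voltage; the function name and defaults are illustrative.

```python
def estimated_energy(width=8, window=4, v_approx=1.04, v_exact=1.13):
    """Normalized power of the adder: approximate bits scaled by (V_approx / V_exact)^2."""
    scale = (v_approx / v_exact) ** 2
    return window * scale + (width - window)


energy = estimated_energy()
saving = 1 - energy / 8        # relative to a conventional 8-bit adder (normalized to 8)
print(f"E = {energy:.3f}, saving = {saving:.1%}")
```

Changing `window` shows how the estimated saving grows as more bits are moved to the approximate, lower-voltage part.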

A detailed analysis of the full adder truth table shows that Sum = $\overline{C_{out}}$ for 6 of the
8 possible input combinations; the exceptions are the cases where A, B, and Cin are all 0 or all 1.
Further, in the traditional approximate adder shown in Figure 9, $\overline{C_{out}}$ is evaluated in the
first stage. One way to simplify the traditional approximate adder is therefore to discard the Sum circuit.
If Sum = $\overline{C_{out}}$ is set directly in the traditional approximate adder, the capacitance at the
Sum node becomes a combination of 4 source-drain diffusion and 2 gate capacitances. This is a
considerable increase in total capacitance compared to the traditional approximate adder, and it causes
additional delay where two or more approximate adders are cascaded. Since Sum = $\overline{C_{out}}$ for
6 of the 8 combinations, the simplified approximate adder is instead built around $\overline{C_{out}}$, as
demonstrated in Figure 10. Moreover, Figure 11 shows the simplified approximate adder circuit using this
approach. As a result, 3 errors in Sum and 1 error in Cout are generated, as demonstrated in Table 3.


Figure 9. Traditional approximate adder

Figure 10. Simplified approximate adder


Table 3. Truth table for traditional full adder and approximations
Inputs Exact outputs Approximate outputs


A B Cin Sum Cout Sum1 Cout1 Sum2 Cout2 Sum3 Cout3
0 0 0 0 1 0 0 0 0 0 
0 0 1 1 0 1 0 1 0 0 0 
0 1 0 1 0 0 1 0 0 1 0 
0 1 1 0 1 0 1 1 0 1 0 
1 0 0 1 0 1 0 0 1 0 1 
1 0 1 0 1 0 1 0 1 0 1 
1 1 0 0 1 0 1 0 1 1 1 
1 1 1 1 1 0 1 1 1 1 1 


Figure 11. Simplified approximate adder circuit


Both the discrete cosine transform (DCT) and inverse discrete cosine transform (IDCT) blocks
operate at a lower supply voltage with approximate adders than in the exact mode. Here, DCT and IDCT
operate at supply voltages of 1.28 V and 1.13 V in the exact mode, respectively. The supply voltages for
different approximations and truncations considering varied numbers of bits are shown in Figures 12 and
13. Table 4 lists the percentage power savings of the approximations and truncation against the base
case. Among the three approximations, approximation 3 saves the most power, although truncation saves
more in every column.

Table 4. Percentage power savings for approximations over the base case
Technique        7 LSBs  8 LSBs  9 LSBs
Truncation       48.22   56.23   61.24
Approximation 1  37.86   50.85   55.26
Approximation 2  41.21   49.13   53.84
Approximation 3  42.46   52.64   59.23

Figure 12. Operating voltages considering different bits for DCT technique


Figure 13. Operating voltages considering different bits for IDCT technique


4.2. Performance metrics comparison for approximate carry look-ahead adder 

Here $G_k$ is defined as the accurate result for the $k$-th input set. The performance of the proposed
adder design is compared with previous works [23], [24] in terms of performance metrics such as
percentage error rate, average error detection, and normalized error detection. The adder design
structure presented in [24] is called generic accuracy configurable (GeAr), and the one in [23] is based on
an error reduction unit (ERU). The ERU design is evaluated in two configurations, i.e., with and without
the error reduction unit; a detailed description is presented in [23]. The proposed adder design is
studied as an 8-bit adder for varied window sizes, while in [23] the window size is fixed as 2k. Figure 14
compares the proposed 8-bit approximate adder with GeAr in terms of error rate (%) for different window
sizes. It is evident from Figure 14 that the accuracy of the proposed approximate adder design is higher
than that of [24]. The other error metrics used for performance comparison are average error detection
and normalized error detection.

Figure 15 compares the proposed 8-bit approximate adder with [23], [24] in terms of average error
detection for three different window sizes. It is evident from Figure 15 that the proposed adder design
has a slightly lower average error detection value than GeAr and similar values to the ERU design without
the error reduction unit. However, the lowest average error detection values are observed when the error
reduction unit is used. Figure 16 compares the proposed 8-bit approximate adder with [23], [24] in terms
of normalized error detection for three different window sizes. It is evident from Figure 16 that the
proposed adder design has slightly lower normalized error detection values than GeAr and similar values
to the ERU design both with and without the error reduction unit. Thus, in terms of normalized error
detection values, the proposed adder design performs well.
Figure 14. Error rate (%) comparison of proposed adder vs GeAr 
Figure 15. Average error detection comparison of proposed adder against varied adder designs 
Figure 16. Normalized error detection comparison of proposed adder against varied adder designs 
Figure 17 shows the delay (ps) of the proposed 8-bit approximate adder design against varied adder 
designs. Delay increases with window size for all the adder designs, so the reported values depend on the 
chosen window size. However, the proposed approximate adder design shows the least delay among all the 
approximate adder designs. 
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Figure 17. Delay comparison considering proposed adder design against varied approximate adder designs 


Figure 18 shows the area (µm²) of the proposed 8-bit approximate adder design against varied adder 
designs. The area differs across window sizes; however, the proposed approximate adder design requires the 
least area among all the approximate adder designs. 


Figure 18. Area comparison considering proposed adder design against varied approximate adder designs 


Figure 19 shows the power (µW) of the proposed 8-bit approximate adder design against varied adder 
designs. The proposed approximate adder design requires the least power among all the approximate adder 
designs. 

Figure 20 shows the energy (aJ) of the proposed 8-bit approximate adder design against varied adder 
designs. The proposed approximate adder design requires the least energy among all the approximate adder 
designs. Overall, the proposed approximate adder design outperforms the other approximate adder designs in 
delay, area, power, and energy. 
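
The units used in the delay, power, and energy comparisons are mutually consistent if energy is read as a 
power-delay product. This reading is an assumption, since the text does not state how the energy values 
were obtained; the symbols E, P, and t_d are introduced here only for illustration. 

```latex
E = P \cdot t_d, \qquad
1\,\mu\mathrm{W} \times 1\,\mathrm{ps}
  = 10^{-6}\,\mathrm{W} \times 10^{-12}\,\mathrm{s}
  = 10^{-18}\,\mathrm{J}
  = 1\,\mathrm{aJ}
```

Under this reading, a design with both the least delay and the least power necessarily has the least 
energy as well. 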
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Figure 19. Power comparison considering proposed adder design against varied approximate adder designs 
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Figure 20. Energy comparison considering proposed adder design against varied approximate adder designs 


5. CONCLUSION 

A novel approximate adder topology for a single-precision floating-point adder is presented in this 
paper. The proposed design takes advantage of the fact that the addition of the lower significant bits can 
be approximated without greatly affecting the result, while the power savings from the approximate 
computation are significant. The proposed configurations have a lower propagation delay and comparable 
error performance compared to other architectures. The proposed mantissa adder, a hybrid of a carry 
look-ahead adder for carry generation and an approximate adder for sum generation, gives a distinct 
advantage in power consumption over the conventional full adder. In addition, the concept of switching 
between exact and approximate computing is discussed, and a performance comparison between exact and 
approximate computing is presented. The performance results show that, for an approximate adder with a 
larger window size, the power savings from approximate computation are significant with minimum delay. 
The accuracy difference between exact and approximate computing remains negligible. Therefore, 
approximate computing can be used instead of exact computing in future DSP applications. Future work will 
aim to improve on-chip power efficiency so that approximate computing becomes a practical mainstream 
computing paradigm. 
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